2 Graphical Description of Data

In chapter 1, you were introduced to the concept of a population, which is the collection of all the measurements from the individuals of interest. Remember, in most cases you can’t collect data from the entire population, so you have to take a sample. Thus, you collect data either through a sample or a census. Now you have a large number of data values. What can you do with them? No one likes to look at just a set of numbers. One option is to organize the data into a table or graph. Ultimately though, you want to be able to use that graph to interpret the data, to describe the distribution of the data set, and to explore different characteristics of the data. The characteristics that will be discussed in this chapter and the next chapter are:

Center: middle of the data set, also known as the average.
Variation: how much the data varies.
Distribution: shape of the data (symmetric, uniform, or skewed).
Qualitative data: analysis of the data
Outliers: data values that are far from the majority of the data.
Time: changing characteristics of the data over time.

This chapter will focus mostly on using the graphs to understand aspects of the data, and not as much on how to create the graphs. There is technology that will create most of the graphs, though it is important for you to understand the basics of how to create them.

This textbook uses RStudio to perform all graphical and descriptive statistics, and all statistical inference. When using RStudio, every command has the same form. You start off with a goal(response variable ~ explanatory variable, data=data frame_name,…). When there is only one variable, it goes to the right of the ~.

RStudio uses packages to make calculations easier. For this textbook, you will mostly need the package mosaic. There will be others that you will need on occasion, but you will be told that at the time. Most likely, mosaic is already installed in your RStudio. If you wish to install other packages you use the command

install.packages("name of package")

where you replace the name of package with the package you wish to install.

Once the package is installed, then you will need to tell RStudio you want to use it every time you start RStudio. The command to tell RStudio you want to use a package is

library("name of package")

You will need to load the package mosaic each time you start RStudio. The NHANES package contains a data frame that is useful. Both are loaded by running the command library("name of package") with the name of the package filled in.
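The following is a minimal sketch of a check you could run at the start of a session, assuming mosaic and NHANES are the only packages you need. It does not install anything; it only reports which packages are missing and loads them when both are present. The names needed, installed, and missing_pkgs are made up for this illustration.

```r
# Packages this chapter relies on (mosaic for the gf_ commands, NHANES for data)
needed <- c("mosaic", "NHANES")

# requireNamespace() returns TRUE when a package is installed, without attaching it
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  # Nothing is installed automatically; this only tells you what to install
  message("Use install.packages() for: ", paste(missing_pkgs, collapse = ", "))
} else {
  library(mosaic)
  library(NHANES)
}
```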

Back to the basic command

goal(response variable ~ explanatory variable, data=data frame_name,…)

The goal depends on what you want to do. If you want to create a graph then you would need

gf_graph_type(response_variable ~ explanatory_variable, data=data_frame_name, …)

As an example if you want to create a density plot of cholesterol levels on day 2 from a data frame called Cholesterol, then your command would be

gf_density(~day2, data=Cholesterol)

You will see more on the different commands as they are needed. A word about the … at the end of the command: it means there are optional arguments you can add, such as a title or axis labels. You do not need to use them if you don’t want to. The following sections will show you how to create the different graphs that are usually covered in an introductory statistics course.
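As an illustration of the formula part of a command, here is a small sketch using the gender and height columns from the first six rows of the class survey (Table 2.1 later in this chapter), typed in by hand. The names class_head, one_variable, and two_variables are made up for this example. In a two-sided formula, gf_point draws the variable on the left of the ~ on the vertical axis and the variable on the right on the horizontal axis. The plotting lines only run if the ggformula package, which supplies the gf_ commands, is installed.

```r
# First six rows of the class survey, typed in by hand (a subset, not the full data)
class_head <- data.frame(
  gender = c("Female", "Female", "Female", "Female", "Male", "Male"),
  height = c(61, 60, 68, 66, 71, 72)
)

one_variable  <- ~ height          # one-sided: a single variable after the ~
two_variables <- height ~ gender   # two-sided: left side is the vertical axis

all.vars(one_variable)    # variables named in the one-sided formula
all.vars(two_variables)   # variables named in the two-sided formula

# The gf_ commands come from the ggformula package
if (requireNamespace("ggformula", quietly = TRUE)) {
  library(ggformula)
  print(gf_density(~ height, data = class_head, xlab = "Height (inches)"))
  print(gf_point(height ~ gender, data = class_head,
                 xlab = "Gender", ylab = "Height (inches)"))
}
```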

2.1 Qualitative Data

Remember, qualitative data are words describing a characteristic of the individual. There are several different graphs that are used for qualitative data. These graphs include bar graphs, Pareto charts, and pie charts. Bar graphs can be created using a statistical program like RStudio.

Bar graphs or charts consist of the frequencies on one axis and the categories on the other axis. A bar graph is drawn in R using the following command.

gf_bar(~explanatory variable, data=Dataframe)

2.1.1 Example: Drawing a Bar Chart

Data was collected for two semesters in a statistics class. The data frame is in Table 2.1. The command

head(data frame)

shows the variables and the first few lines of the data set. The data sets are usually larger than what is shown. The head command allows one to see the structure of the data frame.

Class<-read.csv( "https://krkozak.github.io/MAT160/class_survey.csv") 
knitr::kable(head(Class))

Table 2.1: Head of Statistics Class Survey
vehicle	gender	distance_campus	ice_cream	rent	major	height	winter
None	Female	1.5	Cookie Dough	724	Environmental and Sustainability Studies	61	Liked it
Mercury	Female	14.7	Sherbet	200	Administrative Justice	60	Don’t like it
Ford	Female	2.4	Chocolate Brownie.	600	Bio Chem	68	Liked it
Toyota	Female	5.2	coffee	0		66	Loved it
Jeep	Male	2.0	Cookie Dough	600	Pre-health Careers	71	Loved it
Subaru	Male	5.0	none	500	Finance	72	No opinion
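Besides head, a few base R commands help you see the structure of a data frame. The sketch below types in some of the columns from the six rows shown in Table 2.1 rather than reading the full file, so the row count is that of the excerpt only. The name class_head is made up for this illustration.

```r
# Six rows and five of the columns from Table 2.1, typed in by hand
class_head <- data.frame(
  vehicle = c("None", "Mercury", "Ford", "Toyota", "Jeep", "Subaru"),
  gender = c("Female", "Female", "Female", "Female", "Male", "Male"),
  distance_campus = c(1.5, 14.7, 2.4, 5.2, 2.0, 5.0),
  rent = c(724, 200, 600, 0, 600, 500),
  height = c(61, 60, 68, 66, 71, 72)
)

head(class_head, 3)   # only the first three rows
names(class_head)     # the variable names
str(class_head)       # the type of each variable (chr = words, num = numbers)
dim(class_head)       # number of rows, then number of columns
```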

Every data frame has a code book that describes the data set, the source of the data set, and a listing and description of the variables in the data frame.

Code book for data frame Class

Description Survey results from two semesters of statistics classes at Coconino Community College in the years 2018-2019.

Format

This data frame contains the following columns:

vehicle: Type of car a student drives

gender: Self declared gender of a student

distance_campus: how far a student lives from the Lone Tree Campus of Coconino Community College (miles)

ice_cream: favorite ice cream flavor

rent: How much a student pays in rent

major: Student’s declared major

height: height of the student (inches)

winter: Student’s opinion of winter (Loved it, Liked it, Don’t like it, No opinion)

Source

Kozak K (2019). Survey results from surveys collected in statistics class at Coconino Community College.

References

Kozak, 2019

Create a bar graph of vehicle type. To do this in RStudio, use the command

gf_bar(~variable, data=Data_Frame, …)

In the solution below, gf_bar is the goal, vehicle is the variable being graphed (there is only one variable, so it goes after the ~), the data frame is Class, and a title and an x-axis label are added to the graph.

2.1.1.1 Solution

gf_bar(~vehicle, data=Class, title="Bar Chart of Cars driven by students in statistics class", xlab="Vehicle")

Figure 2.1: Cars driven by students in statistics class

Description of Figure 2.1: a bar graph with bars for Audi, Buick, Honda, Hyundai, Mercury, and Nissan each at a height of 1, Dodge and None at a height of 2, Jeep, Subaru, and Toyota at a height of 3, and Chevrolet and Ford at a height of 4.

Notice from Figure 2.1 that Chevrolet and Ford are the most popular cars, with Jeep, Subaru, and Toyota not far behind. Many other types are driven by only one student each and are tied for last place. However, more data would help to confirm this.
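The bar heights are just counts of how often each category appears. The sketch below rebuilds the vehicle values from the bar heights given in the description of Figure 2.1 (the full data file is not read here) and counts them with base R's table command. Sorting the counts from largest to smallest gives the bar order a Pareto chart would use. The names counts, vehicle, and freq are made up for this illustration.

```r
# Bar heights taken from the description of Figure 2.1
counts <- c(Audi = 1, Buick = 1, Chevrolet = 4, Dodge = 2, Ford = 4,
            Honda = 1, Hyundai = 1, Jeep = 3, Mercury = 1, Nissan = 1,
            None = 2, Subaru = 3, Toyota = 3)

# One entry per student, as the vehicle column of the data frame would hold
vehicle <- rep(names(counts), times = counts)

freq <- table(vehicle)            # the frequency of each category
freq
sort(freq, decreasing = TRUE)     # most common first (Pareto ordering)

# Categories tied for the tallest bar
names(freq)[as.vector(freq) == max(freq)]
```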

All graphs should have labels on each axis and a title for the graph.

The beauty of data frames with multiple variables is that you can answer many questions from the data. Suppose you want to see if gender makes a difference for the type of car a person drives. If you were a car manufacturer and knew that certain genders like certain cars, then you could advertise differently to the different genders. To create a bar graph that separates based on gender, perform the following command in RStudio.

gf_bar(~vehicle, fill=~gender, data=Class, title="Cars driven by students in statistics class", xlab="Vehicle", position=position_dodge())

Description of Figure 2.2 is a bar graph of number of vehicles separated by female and male. Audi and male has height of 1, Buick and female has a height of 1, Chevrolet and male and Chevrolet and female have heights of 2, Dodge and male and Dodge and female has heights of 1, Ford and female has a height of 4, Honda and female has a height of 1, Hyundai and male has a height of 1, Jeep and male has a height of 2 while Jeep and female has a height of 1, Mercury and female has a height of 1, Nissan and female has a height of 1, no car and female has a height of 2, Subaru and female has a height of 1, Subaru and male has a height of 2, Toyota and female has a height of 1, and Toyota and male has a height of 2.

Notice that a Ford is driven by females more than any other car, while Chevrolet, Jeep, Subaru, and Toyota cars are driven equally often by males (two each). Obviously a larger sample would be needed to draw any conclusions from this data.
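The counts behind a bar graph split by gender can also be laid out as a two-way table. This sketch rebuilds the values from the bar heights in the description of Figure 2.2 rather than reading the data file, and the names female, male, class_cars, and two_way are made up for the illustration.

```r
# Bar heights taken from the description of Figure 2.2
female <- c(Buick = 1, Chevrolet = 2, Dodge = 1, Ford = 4, Honda = 1, Jeep = 1,
            Mercury = 1, Nissan = 1, None = 2, Subaru = 1, Toyota = 1)
male <- c(Audi = 1, Chevrolet = 2, Dodge = 1, Hyundai = 1, Jeep = 2,
          Subaru = 2, Toyota = 2)

# One row per student
class_cars <- data.frame(
  vehicle = c(rep(names(female), female), rep(names(male), male)),
  gender = rep(c("Female", "Male"), times = c(sum(female), sum(male)))
)

# Rows are vehicles, columns are genders
two_way <- table(class_cars$vehicle, class_cars$gender)
two_way
two_way["Ford", "Female"]   # number of females who drive a Ford
colSums(two_way)            # number of females and males in the rebuilt data
```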

There are other types of graphs that can be created for qualitative variables. Another type is known as a dot plot. The command for this graph is as follows.

gf_dotplot(~vehicle, data=Class, title="Cars driven by students in statistics class", xlab="Vehicle")

Description of Figure 2.3 is a dot plot of number of vehicles with Audi, Buick, Honda, Hyundai, Mercury, and Nissan at a height of 1, Dodge and None at a height of 2, Jeep, Subaru, and Toyota at a height of 3, and Chevrolet and Ford at a height of 4. It is very similar to the bar graph.

Notice a dot plot is like a bar chart. Both give you the same information. You can also divide a dot plot by gender.

Another type of graph that is also useful and similar to the dot plot is a point plot (scatter plot). In this plot you can graph the explanatory variable versus the response variable. The command for this in RStudio is as follows.

gf_point(vehicle~gender, data=Class, title="Cars driven by students in statistics class", xlab="Gender", ylab="Vehicle")

Description of Figure 2.4 is a scatter plot of type of vehicles separated by female and male with females owning Toyota, Subaru, none, Nissan, Mercury, Jeep, Honda, Ford, Dodge, Chevrolet, and Buick, while males own Toyota, Subaru, Jeep, Hyundai, Dodge, Chevrolet, and Audi.

The problem with Figure 2.4 is that if there are multiple females who drive a Ford, only one dot is shown. So it is best to spread the dots out using a plot known as a jitter plot. In a jitter plot the dots are randomly moved off the center line. The command for a jitter plot is as follows:

gf_jitter(vehicle~gender, data=Class, title="Cars driven by students in statistics class", xlab="Gender", ylab="Vehicle")

Description of Figure 2.5 is a jitter plot of number of vehicles separated by female and male with females owning 1 Toyota, 1 Subaru, 2 with none, 1 Nissan, 1 Mercury, 1 Jeep, 1 Honda, 4 Fords, 1 Dodge, 2 Chevrolets, and 1 Buick, while males own 2 Toyotas, 2 Subarus, 2 Jeeps, 1 Hyundai, 1 Dodge, 2 Chevrolets, and 1 Audi.

Now you can observe that there are 4 females who drive a Ford. There is one female who drives a Honda. Other information about other cars and genders can be seen better than in the point plot and the bar graph. Jitter plots are useful for seeing how many data values fall in each category of a qualitative variable.
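The idea behind jittering can be sketched with base R's jitter command, which adds a small random amount to each value. This is only an illustration of the idea; the amount of random movement gf_jitter applies may differ. The four rows stand for the four females who drive a Ford, and the position value 1 is an arbitrary stand-in for where that dot would be drawn.

```r
set.seed(2)  # jittering is random, so a seed makes the result repeatable

# Four data values that would be drawn at exactly the same position
ford_females <- data.frame(x = rep(1, 4), y = rep(1, 4))
nrow(unique(ford_females))   # number of distinct positions before jittering

# Move each point a small random distance off the shared position
jittered <- data.frame(x = jitter(ford_females$x), y = jitter(ford_females$y))
nrow(unique(jittered))       # number of distinct positions after jittering
```

Before jittering there is a single distinct position, so a point plot shows one dot; afterwards each of the four values has its own position.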

There are many other types of graphs that can be used on qualitative data. There are spreadsheet software packages that will create most of them, and it is best to explore those packages to see how to create them. Which graphs are useful depends on your data, but the bar, dot, and jitter plots are really the most useful.

2.1.2 Homework for Qualitative Data Section

Eyeglassomatic manufactures eyeglasses for different retailers. The number of lenses for different activities is in Table 2.2.

Eyeglasses<-read.csv( "https://krkozak.github.io/MAT160/eyglasses.csv") 
knitr::kable(head(Eyeglasses))

Table 2.2: Head of Eyeglasses Data frame
activity
Grind
Grind
Grind
Grind
Grind
Grind

Code book for Data Frame Eyeglasses

Description Activities that an eyeglass company performs when making eyeglasses. Grind means the company ground the lenses and put them in frames; multicoat means it put tinting or coatings on lenses and then put them in frames; assemble means it received frames and lenses from other sources and put them together; make frames means it made the frames and put in lenses from other sources; receive finished means it received finished glasses from another source; unknown means it is not known where the lenses came from.

Format

This data frame contains the following columns:

activity: The activity that is completed to make the eyeglasses by Eyeglassomatic

Source John Matic provided the data from a company he worked with. The company’s name is fictitious, but the data is from an actual company.

References John Matic (2013)

Make a bar chart of this data. State any findings you can see from the graph.

Data was collected for two semesters in a statistics class. The data frame is in Table 2.1.

Code book for the Data Frame Class is found below Table 2.1.

Create a bar graph of the variable ice_cream. State any findings you can see from the graph.

The number of deaths in the US due to carbon monoxide (CO) poisoning from generators from the years 1999 to 2011 are in Table 2.3 (Hinatov, 2012). Create a bar chart of this data. State any findings you see from the graph.

Area<-read.csv( "https://krkozak.github.io/MAT160/area.csv") 
knitr::kable(head(Area))

Table 2.3: Head of Area Data frame
deaths
Urban
Urban
Urban
Urban
Urban
Urban

Data was collected for two semesters in a statistics class. The data frame is in Table 2.1. Create a bar graph and dot plot of the variable major. Create a jitter plot of major and gender. State any findings you can see from the graphs.

Code book for the Data Frame Class is found below Table 2.1.

Eyeglassomatic manufactures eyeglasses for different retailers. They test to see how many defective lenses they made during the time period of January 1 to March 31. Table 2.4 gives the defect and the number of defects. Create a bar chart of the data and then describe what this tells you about what causes the most defects.

Defects<- read.csv( "https://krkozak.github.io/MAT160/defects.csv") 
knitr::kable(head(Defects))

Table 2.4: Head of Defects Data frame
type
small
small
pd
flaked
scratch
spot

Code book for Data Frame Defects

Description Types of defects that an Eyeglass company sees in the lenses they make into eyeglasses.

Format

This data frame contains the following columns:

type: The type of defect that is seen when making eyeglasses by Eyeglassomatic

Source John Matic provided the data from a company he worked with. The company’s name is fictitious, but the data is from an actual company.

References John Matic (2013)

The American National Health and Nutrition Examination Survey (NHANES) data are collected every year by the US National Center for Health Statistics (NCHS). The data frame is in Table 2.5. Create a bar chart of MaritalStatus. Create a jitter plot of MaritalStatus versus Education. Describe any findings from the graphs.

knitr::kable(head(NHANES))

Table 2.5: NHANES Data frame
ID	SurveyYr	Gender	Age	AgeDecade	AgeMonths	Race1	Race3	Education	MaritalStatus	HHIncome	HHIncomeMid	Poverty	HomeRooms	HomeOwn	Work	Weight	Length	HeadCirc	Height	BMI	BMICatUnder20yrs	BMI_WHO	Pulse	BPSysAve	BPDiaAve	BPSys1	BPDia1	BPSys2	BPDia2	BPSys3	BPDia3	Testosterone	DirectChol	TotChol	UrineVol1	UrineFlow1	UrineVol2	UrineFlow2	Diabetes	DiabetesAge	HealthGen	DaysPhysHlthBad	DaysMentHlthBad	LittleInterest	Depressed	nPregnancies	nBabies	Age1stBaby	SleepHrsNight	SleepTrouble	PhysActive	PhysActiveDays	TVHrsDay	CompHrsDay	TVHrsDayChild	CompHrsDayChild	Alcohol12PlusYr	AlcoholDay	AlcoholYear	SmokeNow	Smoke100	Smoke100n	SmokeAge	Marijuana	AgeFirstMarij	RegularMarij	AgeRegMarij	HardDrugs	SexEver	SexAge	SexNumPartnLife	SexNumPartYear	SameSex	SexOrientation	PregnantNow
51624	2009_10	male	34	30-39	409	White	NA	High School	Married	25000-34999	30000	1.36	6	Own	NotWorking	87.4	NA	NA	164.7	32.22	NA	30.0_plus	70	113	85	114	88	114	88	112	82	NA	1.29	3.49	352	NA	NA	NA	No	NA	Good	0	15	Most	Several	NA	NA	NA	4	Yes	No	NA	NA	NA	NA	NA	Yes	NA	0	No	Yes	Smoker	18	Yes	17	No	NA	Yes	Yes	16	8	1	No	Heterosexual	NA
51624	2009_10	male	34	30-39	409	White	NA	High School	Married	25000-34999	30000	1.36	6	Own	NotWorking	87.4	NA	NA	164.7	32.22	NA	30.0_plus	70	113	85	114	88	114	88	112	82	NA	1.29	3.49	352	NA	NA	NA	No	NA	Good	0	15	Most	Several	NA	NA	NA	4	Yes	No	NA	NA	NA	NA	NA	Yes	NA	0	No	Yes	Smoker	18	Yes	17	No	NA	Yes	Yes	16	8	1	No	Heterosexual	NA
51624	2009_10	male	34	30-39	409	White	NA	High School	Married	25000-34999	30000	1.36	6	Own	NotWorking	87.4	NA	NA	164.7	32.22	NA	30.0_plus	70	113	85	114	88	114	88	112	82	NA	1.29	3.49	352	NA	NA	NA	No	NA	Good	0	15	Most	Several	NA	NA	NA	4	Yes	No	NA	NA	NA	NA	NA	Yes	NA	0	No	Yes	Smoker	18	Yes	17	No	NA	Yes	Yes	16	8	1	No	Heterosexual	NA
51625	2009_10	male	4	0-9	49	Other	NA	NA	NA	20000-24999	22500	1.07	9	Own	NA	17.0	NA	NA	105.4	15.30	NA	12.0_18.5	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	No	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	4	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
51630	2009_10	female	49	40-49	596	White	NA	Some College	LivePartner	35000-44999	40000	1.91	5	Rent	NotWorking	86.7	NA	NA	168.4	30.57	NA	30.0_plus	86	112	75	118	82	108	74	116	76	NA	1.16	6.70	77	0.094	NA	NA	No	NA	Good	0	10	Several	Several	2	2	27	8	Yes	No	NA	NA	NA	NA	NA	Yes	2	20	Yes	Yes	Smoker	38	Yes	18	No	NA	Yes	Yes	12	10	1	Yes	Heterosexual	NA
51638	2009_10	male	9	0-9	115	White	NA	NA	NA	75000-99999	87500	1.84	6	Rent	NA	29.8	NA	NA	133.1	16.82	NA	12.0_18.5	82	86	47	84	50	84	50	88	44	NA	1.34	4.86	123	1.538	NA	NA	No	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	5	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

To view the code book for NHANES, type help("NHANES") in RStudio after you load the NHANES package using library("NHANES").

2.2 Quantitative Data

There are several different graphs for quantitative data. With quantitative data, you can talk about how the data is distributed, called a distribution. The shape of the distribution can be described from the graphs.

Histogram: a graph of frequencies (counts) on the vertical axis and classes on the horizontal axis. The height of the rectangles is the frequency and the width is the class width. The width depends on how many classes (bins) are in the histogram. The shape of a histogram is dependent on the number of bins. In RStudio the command to create a histogram is

gf_histogram(~response variable, data=Data_Frame, title="title of the graph")

The last part of the command puts a title on the graph. You type in whatever you want for the title in the quotes.
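To see how the number of bins changes what a histogram shows, the sketch below counts the same six distances (the distance_campus values from the rows shown in Table 2.1) using two different class widths. It uses base R's hist command with plot = FALSE so that only the counts are returned. In gf_histogram the bins and binwidth arguments control the classes, though where gf_histogram places the bin edges may differ from this sketch.

```r
# distance_campus values from the six rows shown in Table 2.1
distance <- c(1.5, 14.7, 2.4, 5.2, 2.0, 5.0)

# Same data, two different class widths; plot = FALSE returns the counts only
wide   <- hist(distance, breaks = seq(0, 15, by = 5), plot = FALSE)
narrow <- hist(distance, breaks = seq(0, 15, by = 3), plot = FALSE)

wide$counts     # 4 1 1       (three classes of width 5)
narrow$counts   # 3 2 0 0 1   (five classes of width 3, with a gap)
```

With the wider classes the gap between 6 and 12 miles is hidden; with the narrower classes it appears as two empty bins.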

Density Plot: Similar to a histogram, except that the bars are replaced by a smooth curve. The shape does not depend on a number of bins, so the distribution is often easier to determine from the density plot. In RStudio the command to create a density plot is

gf_density(~response variable, data=Data_Frame, title="title of the graph", xlab="Label", ylab="Label")

The last part of the command puts a title on the graph and labels on the axes. You type in whatever you want for the title and labels in the quotes.
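A density curve is scaled so that the total area under it is 1, which is why the vertical axis shows density rather than counts. The sketch below illustrates this with base R's density command as a stand-in; the settings gf_density uses may differ. It again uses the six distance_campus values shown in Table 2.1, and the names smooth and smoother are made up. Although a density plot has no bins, the amount of smoothing (the bandwidth) still affects how bumpy the curve looks.

```r
# distance_campus values from the six rows shown in Table 2.1
distance <- c(1.5, 14.7, 2.4, 5.2, 2.0, 5.0)

smooth   <- density(distance)              # default amount of smoothing
smoother <- density(distance, adjust = 2)  # twice the default bandwidth

# Approximate the area under the curve: curve heights times the spacing in x
step <- diff(smooth$x[1:2])
sum(smooth$y) * step        # close to 1

smooth$bw                   # bandwidth used by default
smoother$bw                 # twice as large, giving a flatter curve
```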

2.2.1 Example: Drawing a Histogram and Density plot

Data was collected for two semesters in a statistics class. The data frame is in Table 2.1 and the code book is below the data frame.

Draw a histogram, density plot, and a dot plot for the variable the distance a student lives from the Lone Tree Campus of Coconino Community College. Describe the story the graphs tell.

2.2.1.1 Solution

gf_histogram(~distance_campus, data=Class, title="Distance in miles from the Lone Tree Campus", xlab="Distance (miles)")

Description of the graph is histogram with high part on left and low part on right with several gaps. The graph contains bars.

gf_density(~distance_campus, data=Class, title="Distance in miles from the Lone Tree Campus", xlab="Distance (miles)")

Description of the graph is density graph with high part on left and low part on right with several gaps. The graph is smooth.

gf_dotplot(~distance_campus, data=Class, title="Distance in miles from the Lone Tree Campus", xlab="Distance (miles)")

Description of the graph is a dot plot with high part on left and low part on right with several gaps. The graph is made of dots that each represent a data value.

Notice the histogram, density plot, and dot plot are all very similar, but the density plot is smoother. They all tell you similar ideas of the shape of the distribution. Reviewing the graphs you can see that most of the students live within 10 miles of the Lone Tree Campus; in fact, most live within 5 miles of the campus. However, there is a student who lives around 50 miles from the Lone Tree Campus. This is a great deal farther than the rest of the data. This value could be considered an outlier. An outlier is a data value that is far from the rest of the values. It may be an unusual value or a mistake. It is a data value that should be investigated. In this case, the student lived really far from campus, thus the value is not a mistake, and is just very unusual. The density plot is probably the best plot for most data frames.

There are other aspects that can be discussed, but first some other concepts need to be introduced.

2.2.2 Shapes of the distribution:

When you look at a distribution, look at the basic shape. There are some basic shapes that are seen in histograms. Realize though that some distributions have no recognizable shape. The common shapes are symmetric, skewed, and uniform. Another feature of interest is how many peaks a graph has. This is described with the term modal.

Symmetric means that you can fold the graph in half down the middle and the two sides will line up. You can think of the two sides as being mirror images of each other. Skewed means one “tail” of the graph is longer than the other. The graph is skewed in the direction of the longer tail (backwards from what you would expect). A uniform graph has all the bars the same height.

Modal refers to the number of peaks. Unimodal has one peak and bimodal has two peaks. Usually if a graph has more than two peaks, the modal information is no longer of interest.

Other important features to consider are gaps between bars, a repetitive pattern, how spread out the data is, and where the center of the graph is.
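The shapes described above can be sketched with small made-up data sets (the values below are invented for illustration and do not come from any data frame in this book). Counting how often each value occurs gives the bar heights a histogram would show. The helper names shape_counts and is_mirror are also made up.

```r
# Bar heights: how many times each value occurs
shape_counts <- function(x) as.vector(table(x))

# Made-up data sets, one for each shape
symmetric    <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
bimodal      <- c(1, 2, 2, 2, 3, 4, 4, 4, 5)
skewed_right <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4)
uniform      <- rep(1:5, each = 2)

shape_counts(symmetric)      # 1 2 3 2 1   one peak, mirror image
shape_counts(bimodal)        # 1 3 1 3 1   two peaks, mirror image
shape_counts(skewed_right)   # 4 3 2 1     tail trails off to the right
shape_counts(uniform)        # 2 2 2 2 2   all bars the same height

# Symmetric means the bar heights read the same forwards and backwards
is_mirror <- function(counts) all(counts == rev(counts))
is_mirror(shape_counts(symmetric))      # TRUE
is_mirror(shape_counts(skewed_right))   # FALSE
```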

2.2.3 Examples of graphs:

This graph is roughly symmetric and unimodal:

Graph: Symmetric Distribution (one side looks like the other)

This graph is symmetric and bimodal:

Graph: Symmetric and Bimodal Distribution (two high bars, and one side looks like the other)

This graph is skewed to the right:

Graph: Skewed Right Distribution (small bars on the right)

This graph is skewed to the left and has a gap:

Graph: Skewed Left Distribution

This graph is uniform since all the bars are the same height:

Graph: Uniform Distribution

2.2.4 Example: Drawing a Histogram and Density plot

Data was collected from the Chronicle of Higher Education for tuition from public four year colleges, private four year colleges, and for profit four year colleges. The data frame is in Table 2.6. Draw a density plot of instate tuition levels for all four year institutions, and then separate the density plot for instate tuition based on type of institution. Describe any findings from the graph.

Tuition<-read.csv( "https://krkozak.github.io/MAT160/Tuition_4_year.csv") 
knitr::kable(head(Tuition))

Table 2.6: Head of Tuition Data Frame
INSTITUTION	TYPE	STATE	ROOM_BOARD	INSTATE_TUITION	INSTATE_TOTAL	OUTOFSTATE_TUITION	OUTOFSTATE_TOTAL
University of Alaska AnchoragePublic 4-year	Public_4 year	AK	12200	7688	19888	23858	36058
University of Alaska FairbanksPublic 4-year	Public_4 year	AK	8930	8087	17017	24257	33187
University of Alaska SoutheastPublic 4-year	Public_4 year	AK	9200	7092	16292	19404	28604
Alaska Bible CollegePrivate 4-year	Private_4_year	AK	5700	9300	15000	9300	15000
Alaska Pacific UniversityPrivate 4-year	Private_4_year	AK	7300	20830	28130	20830	28130
Alabama Agricultural and Mechanical UniversityPublic 4-year	Public_4 year	AL	8379	9698	18077	17918	26297

Code book for Data Frame Tuition

Description Cost of four year institutions.

Format

This data frame contains the following columns:

INSTITUTION: Name of four year institution

TYPE: Type of four year institution, Public_4_year, Private_4_year, For_profit_4_year.

STATE: The state in which the institution is located

ROOM_BOARD: The cost of room and board at the institution ($)

INSTATE_TUITION: The cost of instate tuition ($)

INSTATE_TOTAL: The cost of room and board and instate tuition ($ per year)

OUTOFSTATE_TUITION: The cost of out of state tuition ($ per year)

OUTOFSTATE_TOTAL: The cost of room and board and out of state tuition ($ per year)

Source Tuition and Fees, 1998-99 Through 2018-19. (2018, December 31). Retrieved from https://www.chronicle.com/interactives/tuition-and-fees

References Chronicle of Higher Education, December 31, 2018.

2.2.4.1 Solution

gf_density(~INSTATE_TUITION, data=Tuition, title="Instate Tuition at all Four Year institutions", xlab="Instate Tuition ($ per year)")

Description of the graph is a density plot with a high peak on the left, then a dip and a rise to a peak in the middle that is lower than the left peak, and then the lowest peak on the right.

Density Plot for Instate Tuition Levels at all Four-Year Colleges

gf_density(~INSTATE_TUITION|TYPE, data=Tuition, title="Instate Tuition at all Four Year institutions", xlab="Instate Tuition ($/year)")

Description of Figure 2.9 is a set of density plots separated by type: for profit 4 year with a peak on the left, private 4 year with a peak in the middle, and public 4 year with a peak on the left. Public 4 year has the highest peak, for profit 4 year has a lower peak, and private 4 year has the lowest peak.

The distribution is skewed right, with no gaps. In-state tuition at most institutions is less than $20,000 per year, though some go as high as $60,000 per year. When separated by type, most public institutions cost much less than $20,000 per year, while private four year institutions cost around $30,000 per year and for profit institutions cost around $20,000 per year.
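The | in the formula ~INSTATE_TUITION|TYPE asks for one panel per value of TYPE. A rough way to picture this is that the data are split into groups first and each group is graphed separately. The sketch below uses only the six rows shown in Table 2.6, typed in by hand, so just two of the three types appear. The names tuition_head and groups are made up, and the plotting line runs only if the ggformula package is installed.

```r
# TYPE and INSTATE_TUITION from the six rows shown in Table 2.6
tuition_head <- data.frame(
  TYPE = c("Public_4 year", "Public_4 year", "Public_4 year",
           "Private_4_year", "Private_4_year", "Public_4 year"),
  INSTATE_TUITION = c(7688, 8087, 7092, 9300, 20830, 9698)
)

# One group per type: these are the values each panel would be drawn from
groups <- split(tuition_head$INSTATE_TUITION, tuition_head$TYPE)
names(groups)     # one panel per name
lengths(groups)   # how many institutions fall in each panel

if (requireNamespace("ggformula", quietly = TRUE)) {
  library(ggformula)
  print(gf_density(~ INSTATE_TUITION | TYPE, data = tuition_head,
                   xlab = "Instate Tuition ($/year)"))
}
```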

There are other types of graphs for quantitative data. They will be explored in the next section.

2.2.5 Homework for Quantitative Data Section

The weekly median incomes of males and females for specific occupations are given in Table 2.7 (CPS News Releases. (n.d.). Retrieved July 8, 2019, from https://www.bls.gov/cps/). Create a density plot for males and females. Discuss any findings from the graph. Note: to put two graphs on the same axes, type a piping symbol, either |> (base R) or %>% (magrittr package), at the end of the first command and then type the command for the second graph on the next line. The piping symbols can be thought of as “and then”. Also, use fill="pick a color" in each command so the two graphs are plotted in different colors and are easier to distinguish.
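The piping idea in the note can be sketched with a small made-up data frame (the scores below are invented and are not part of any data frame in this book). The first density plot is created, and then the second is added on top of it in a different color. The |> symbol needs R version 4.1 or later, and the plotting lines run only if the ggformula package is installed.

```r
# Made-up data: two quantitative variables measured on the same scale
scores <- data.frame(
  before = c(61, 64, 66, 70, 73, 75),
  after  = c(68, 71, 72, 77, 80, 84)
)

if (requireNamespace("ggformula", quietly = TRUE)) {
  library(ggformula)
  # Draw the first density plot, and then add the second one to the same axes
  both <- gf_density(~ before, data = scores, fill = "blue", xlab = "Score") |>
    gf_density(~ after, data = scores, fill = "red")
  print(both)
}
```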

Wages<- read.csv( "https://krkozak.github.io/MAT160/wages.csv") 
knitr::kable(head(Wages))

Table 2.7: Head of Wages Data frame
Occupation	Numworkers	median_wage	male_worker	male_wage	female_worker	female_wage
Management, professional, and related occupations	48808	1246	23685	1468	25123	1078
Management, business, and financial operations occupations	19863	1355	10668	1537	9195	1168
Management occupations	13477	1429	7754	1585	5724	1236
Chief executives	1098	2291	790	2488	307	1736
General and operations managers	939	1338	656	1427	283	1139
Legislators	14	NA	10	NA	4	NA

Code book for Data Frame Wages

Description Median weekly earnings of full-time wage and salary workers by detailed occupation and sex. The Current Population Survey (CPS) is a monthly survey of households conducted by the Bureau of Census for the Bureau of Labor Statistics. It provides a comprehensive body of data on the labor force, employment, unemployment, persons not in the labor force, hours of work, earnings, and other demographic and labor force characteristics.

Format

This data frame contains the following columns:

Occupation: Occupations of workers.

Numworkers: The number of workers in each occupation (in thousands of workers)

median_wage: Median weekly wage ($)

male_worker: number of male workers (in thousands of workers)

male_wage: Median weekly wage of male workers ($)

female_worker: number of female workers (in thousands of workers)

female_wage: Median weekly wage of female workers ($)

Source CPS News Releases. (n.d.). Retrieved July 8, 2019, from https://www.bls.gov/cps/

References Current Population Survey (CPS) retrieved July 8, 2019.

The density of people per square kilometer for certain countries is in Table 2.8 (World Bank, 2019). Create a density plot of density in 2018 for just Sub-Saharan Africa. Describe what story the graph tells.

Density<- read.csv( "https://krkozak.github.io/MAT160/density.csv") 
knitr::kable(head(Density))

Table 2.8: Head of Density Data frame
Country_Name	Country_Code	Region	IncomeGroup	y1961	y1962	y1963	y1964	y1965	y1966	y1967	y1968	y1969	y1970	y1971	y1972	y1973	y1974	y1975	y1976	y1977	y1978	y1979	y1980	y1981	y1982	y1983	y1984	y1985	y1986	y1987	y1988	y1989	y1990	y1991	y1992	y1993	y1994	y1995	y1996	y1997	y1998	y1999	y2000	y2001	y2002	y2003	y2004	y2005	y2006	y2007	y2008	y2009	y2010	y2011	y2012	y2013	y2014	y2015	y2016	y2017	y2018
Aruba	ABW	Latin America & Caribbean	High income	307.988889	312.361111	314.972222	316.844444	318.666667	320.638889	322.527778	324.366667	326.255556	328.127778	330.222222	332.444444	334.683333	336.266667	336.983333	336.588889	335.366667	333.905556	333.222222	333.866667	336.483333	340.805556	345.561111	349.088889	350.144444	348.022222	343.516667	339.327778	339.066667	345.272222	359.011111	379.08333	402.80000	426.11111	446.24444	462.22222	474.72778	484.87222	494.47222	504.73889	516.10000	527.73333	538.98333	548.53889	555.72778	560.18889	562.34444	563.10000	563.63889	564.82778	566.92222	569.77778	573.10556	576.52222	579.67222	582.62222	585.36667	588.02778
Afghanistan	AFG	South Asia	Low income	14.044987	14.323808	14.617537	14.926295	15.250314	15.585020	15.929795	16.293023	16.686236	17.114913	17.577191	18.060863	18.547565	19.013188	19.436265	19.825220	20.174779	20.435006	20.542009	20.458461	20.175341	19.732451	19.204316	18.693582	18.286015	17.976563	17.774920	17.795553	18.179820	19.012205	20.370396	22.18783	24.22664	26.15527	27.74049	28.87822	29.64973	30.23277	30.89612	31.82911	33.09590	34.61810	36.27251	37.87440	39.29522	40.48808	41.51049	42.46282	43.49296	44.70408	46.13150	47.73056	49.42804	51.11478	52.71207	54.19711	55.59599	56.93776
Angola	AGO	Sub-Saharan Africa	Lower middle income	4.436891	4.498708	4.555593	4.600180	4.628676	4.637213	4.631622	4.629544	4.654892	4.724765	4.845414	5.012073	5.211328	5.423422	5.634074	5.839022	6.042941	6.249063	6.463517	6.690695	6.930654	7.181319	7.442124	7.712163	7.990693	8.277943	8.574036	8.877878	9.188078	9.503799	9.825059	10.15270	10.48773	10.83159	11.18570	11.55107	11.92875	12.32021	12.72709	13.15110	13.59249	14.05263	14.53556	15.04624	15.58803	16.16259	16.76856	17.40245	18.05910	18.73446	19.42782	20.13951	20.86771	21.61047	22.36655	23.13506	23.91654	24.71305
Albania	ALB	Europe & Central Asia	Upper middle income	60.576642	62.456898	64.329234	66.209307	68.058066	69.874927	71.737153	73.805548	75.974270	77.937190	79.848650	81.865912	83.823066	85.770949	87.767555	89.727226	91.735255	93.659343	95.541314	97.518139	99.491095	101.615985	103.794161	106.001058	108.202993	110.315146	112.540329	114.683796	117.808139	119.946788	119.225912	118.50507	117.78420	117.06336	116.34248	115.62164	114.90077	114.17993	113.45905	112.73821	111.68515	111.35073	110.93489	110.47223	109.90828	109.21704	108.39478	107.56620	106.84376	106.31463	106.02901	105.85405	105.66029	105.44175	105.13515	104.96719	104.87069	104.61226
Andorra	AND	Europe & Central Asia	High income	30.585106	32.702128	34.919149	37.168085	39.465957	41.802128	44.165957	46.574468	49.059574	51.651064	54.380851	57.217021	60.068085	62.808511	65.329787	67.610638	69.725532	71.780851	74.080851	76.738298	79.787234	83.221277	86.951064	90.863830	94.893617	98.972340	103.095745	107.306383	111.591489	115.976596	120.576596	125.29362	129.72553	133.35532	135.85106	136.93617	136.86596	136.47234	136.95745	139.12766	143.27872	149.04043	155.70638	162.22128	167.80213	172.32553	175.92340	178.42979	179.70851	179.67872	178.18511	175.37660	171.85957	168.53830	165.98085	164.46170	163.83191	163.84255
Arab World	ARB			8.430860	8.663154	8.903441	9.152526	9.410965	9.679951	9.959490	10.247580	10.541383	10.839409	11.140162	11.445801	11.762925	12.100336	12.464221	12.856964	13.276051	13.716559	14.171137	14.634158	15.103942	15.581254	16.065812	16.557944	17.057705	17.563945	18.075438	18.592082	19.114029	19.817110	20.358106	20.73408	21.29364	21.84602	22.52760	23.05216	23.57027	24.08237	24.60020	25.12980	25.67166	26.22642	26.80081	27.40153	28.03371	28.69994	29.39751	30.11889	30.85858	31.59402	32.33012	33.06767	33.80379	34.53398	35.25690	35.96876	36.66980	37.37237

Code book for Data Frame Density

Description Population density of all countries in the world

Format

This data frame contains the following columns:

Country_Name: The name of countries or regions around the world

Country_Code: The 3 letter code for a country or region

Region: World Bank's classification of where the country is in the world

IncomeGroup: World Bank's classification of what income level the country is considered to be

y1961-y2018: population density for the years 1961 through 2018, people per sq. km of land area, population density is midyear population divided by land area in square kilometers. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship–except for refugees not permanently settled in the country of asylum, who are generally considered part of the population of their country of origin. Land area is a country’s total area, excluding area under inland water bodies, national claims to continental shelf, and exclusive economic zones. In most cases the definition of inland water bodies includes major rivers and lakes.

Source Population density (people per sq. km of land area). (n.d.). Retrieved July 9, 2019, from https://data.worldbank.org/indicator/EN.POP.DNST

References Food and Agriculture Organization and World Bank population estimates.

Since the Density data frame is for all countries, a new data frame must be created with just the Sub-Saharan Africa countries (Table 2.9). This is created by using the following command:

Africa <- Density |> 
  filter(Region == "Sub-Saharan Africa") 
knitr::kable(head(Africa))

Table 2.9: Head of Africa Data frame
Country_Name	Country_Code	Region	IncomeGroup	y1961	y1962	y1963	y1964	y1965	y1966	y1967	y1968	y1969	y1970	y1971	y1972	y1973	y1974	y1975	y1976	y1977	y1978	y1979	y1980	y1981	y1982	y1983	y1984	y1985	y1986	y1987	y1988	y1989	y1990	y1991	y1992	y1993	y1994	y1995	y1996	y1997	y1998	y1999	y2000	y2001	y2002	y2003	y2004	y2005	y2006	y2007	y2008	y2009	y2010	y2011	y2012	y2013	y2014	y2015	y2016	y2017	y2018
Angola	AGO	Sub-Saharan Africa	Lower middle income	4.4368910	4.4987078	4.5555932	4.6001797	4.6286757	4.637213	4.631622	4.629544	4.654892	4.724765	4.845414	5.012073	5.211328	5.423422	5.634074	5.839022	6.042941	6.249063	6.463517	6.690695	6.930654	7.181319	7.442124	7.712163	7.990693	8.277943	8.574036	8.877878	9.188078	9.503799	9.825059	10.152696	10.487727	10.831593	11.185695	11.551070	11.928748	12.320206	12.727095	13.151097	13.592487	14.052633	14.535557	15.046238	15.588034	16.162590	16.768559	17.402450	18.059101	18.734456	19.427818	20.139513	20.867715	21.610475	22.366553	23.135064	23.916538	24.713052
Burundi	BDI	Sub-Saharan Africa	Low income	111.0762461	113.2134346	115.4371885	117.8461838	120.4976246	123.461449	126.682944	129.942640	132.940187	135.477959	137.460942	139.005685	140.386527	141.994977	144.115265	146.840771	150.095210	153.787617	157.758333	161.888551	166.141744	170.550000	175.137578	179.949494	185.001441	190.293731	195.760826	201.273287	206.661565	211.797391	216.702726	221.400506	225.780880	229.710553	233.140304	235.985631	238.400701	240.870794	244.046885	248.398403	254.110008	261.063590	269.048053	277.713902	286.793692	296.255802	306.160981	316.436994	327.011994	337.834969	348.847586	360.046262	371.506581	383.344899	395.639797	408.411137	421.613084	435.178271
Benin	BEN	Sub-Saharan Africa	Low income	21.8682778	22.1966655	22.5510731	22.9333540	23.3447677	23.786440	24.257778	24.756917	25.280782	25.827776	26.397410	26.991548	27.613294	28.267222	28.956767	29.684046	30.449087	31.251667	32.090511	32.965280	33.878397	34.832512	35.827856	36.864305	37.943429	39.060890	40.220495	41.440688	42.745796	44.151259	45.667781	47.284525	48.969165	50.675949	52.372810	54.046284	55.708044	57.380853	59.099840	60.889952	62.759250	64.698421	66.695238	68.730082	70.789509	72.870672	74.980428	77.127714	79.325186	81.582645	83.902359	86.282795	88.724619	91.227758	93.791699	96.417763	99.106101	101.853920
Burkina Faso	BFA	Sub-Saharan Africa	Low income	17.8895468	18.1298465	18.3765387	18.6362939	18.9139985	19.211853	19.528578	19.861261	20.205314	20.557748	20.918790	21.290837	21.675742	22.076173	22.494682	22.931422	23.387920	23.869953	24.384708	24.937292	25.530556	26.163213	26.830793	27.526469	28.245274	28.986455	29.751729	30.542050	31.359002	32.204072	33.077792	33.980676	34.914020	35.879342	36.878209	37.912080	38.982259	40.090365	41.237942	42.426689	43.657116	44.930921	46.252270	47.626349	49.056762	50.545234	52.090720	53.690515	55.340271	57.036612	58.778914	60.567420	62.400493	64.276378	66.193801	68.151966	70.150892	72.191283
Botswana	BWA	Sub-Saharan Africa	Upper middle income	0.9046371	0.9242108	0.9452208	0.9667267	0.9881143	1.009235	1.030635	1.053318	1.078644	1.107609	1.140485	1.177090	1.217356	1.261116	1.308127	1.358635	1.412540	1.468895	1.526432	1.584296	1.641713	1.699001	1.757680	1.819983	1.887287	1.960269	2.037842	2.117529	2.195903	2.270492	2.340307	2.406003	2.468742	2.530410	2.592370	2.655109	2.718093	2.780555	2.841325	2.899677	2.954984	3.007856	3.060360	3.115288	3.174489	3.239476	3.309264	3.380162	3.446964	3.506264	3.556194	3.598805	3.639363	3.685377	3.742022	3.811240	3.890967	3.977425
Central African Republic	CAF	Sub-Saharan Africa	Low income	2.4496228	2.4911073	2.5351857	2.5821310	2.6320363	2.685510	2.742146	2.799759	2.855406	2.907227	2.954377	2.998141	3.041595	3.089005	3.143547	3.205583	3.274453	3.351091	3.436349	3.530380	3.634855	3.748648	3.865801	3.978269	4.080659	4.169895	4.248676	4.324333	4.407419	4.505336	4.620548	4.750130	4.889642	5.032288	5.172969	5.310336	5.445497	5.578818	5.711281	5.843570	5.974539	6.103130	6.230025	6.356344	6.482362	6.610275	6.738595	6.859556	6.962703	7.041587	7.092741	7.121280	7.139783	7.165840	7.212382	7.283841	7.377489	7.490412
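To see what filter() keeps, the same pattern can be tried on a tiny made-up data frame. This is only a sketch: the data frame toy_density and its values are hypothetical, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not the real Density values
toy_density <- data.frame(
  Country_Name = c("A", "B", "C", "D"),
  Region = c("Sub-Saharan Africa", "Europe & Central Asia",
             "Sub-Saharan Africa", "South Asia"),
  y2018 = c(25, 105, 435, 300)
)

# Keep only the rows whose Region matches the text exactly
toy_africa <- toy_density |>
  filter(Region == "Sub-Saharan Africa")

nrow(toy_africa)
```

The match is exact, so spelling, capitalization, and the hyphen in the region name all have to agree with the values stored in the Region column.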

The Affordable Care Act created a marketplace for individuals to purchase health care plans. In 2014, the premiums for a 27 year old for the different levels of health insurance are given in Table 2.10 (“Health insurance marketplace,” 2013). Create a density plot of bronze_lowest, then silver_lowest, and gold_lowest all on the same axes. Use |> or %>% at the end of each command. Describe the story the graph tells.

Insurance<- read.csv( "https://krkozak.github.io/MAT160/insurance.csv") 
knitr::kable(head(Insurance))

Table 2.10: Head of Insurance Data frame
state	average_QHP	bronze_lowest	silver_lowest	gold_lowest	catastrophic	second_silver_pretax	second_silver_posttax	lowest_bronze_posttax	silver_family_pretax	silver_family_posttax	bronze_family_posttax
AK	34	254	312	401	236	312	107	48	1131	205	0
AL	7	162	200	248	138	209	145	98	757	282	112
AR	28	181	231	263	135	241	145	85	873	282	64
AZ	106	141	164	187	107	166	145	120	600	282	192
DE	19	203	234	282	137	237	145	111	859	282	158
FL	102	169	200	229	132	218	145	96	789	282	104

Code book for Data Frame Insurance

Description The Affordable Care Act created a marketplace for individuals to purchase health care plans. The data is from 2014.

Format

This data frame contains the following columns:

state: state of insured.

average_QHP: The number of qualified health plans

bronze_lowest: premium for the lowest bronze level of insurance for a single person ($)

silver_lowest: premium for the lowest silver level of insurance for a single person ($)

gold_lowest: premium for the lowest gold level of insurance for a single person ($)

catastrophic: premium for the catastrophic level of insurance for a single person ($)

second_silver_pretax: premium for the second silver level of insurance for a single person pretax ($)

second_silver_posttax: premium for the second silver level of insurance for a single person posttax ($)

lowest_bronze_posttax: premium for the lowest bronze level of insurance for a single person posttax ($)

silver_family_pretax: premium for the silver level of insurance for a family pretax ($)

silver_family_posttax: premium for the silver level of insurance for a family posttax ($)

bronze_family_posttax: premium for the bronze level of insurance for a family posttax ($)

Source Health Insurance Market Place Retrieved from website: http://aspe.hhs.gov/health/reports/2013/marketplacepremiums/ib_premiumslandscape.pdf premiums for 2014.

References Department of Health and Human Services, ASPE. (2013). Health insurance marketplace

Students in a statistics class took their first test. The scores they earned are in Table 2.11. Create a density plot for grades. Describe the shape of the distribution.

Firsttest_1<- read.csv( "https://krkozak.github.io/MAT160/firsttest_1.csv") 
knitr::kable(head(Firsttest_1))

Table 2.11: Head of First Test Data frame
grades
80
79
89
74
73
67

Students in a statistics class took their first test. The scores they earned are in Table 2.12. Create a density plot for grades. Describe the shape of the distribution. Compare to the graph in question 4.

Firsttest_2<- read.csv( "https://krkozak.github.io/MAT160/firsttest_2.csv") 
knitr::kable(head(Firsttest_2))

Table 2.12: Head of First Test Data frame
grades
67
67
76
47
85
70

2.3 Other Graphical Representations of Data

There are many other types of graphs. Two of the more common ones are the point plot (scatter plot) and the time-series plot. Many different graphs for qualitative data have also emerged lately, and they are often found in publications and on websites. The following is a description of the point plot (scatter plot) and the time-series plot.

2.3.1 Point Plots or Scatter Plots

Sometimes you have two different variables and you want to see if they are related in any way. A scatter plot helps you to see what the relationship would look like. A scatter plot is just a plotting of the ordered pairs.

2.3.2 Example: Scatter Plot

Is there a relationship between systolic blood pressure and weight? To answer this question some data is needed. The data frame NHANES contains this data, but given the size of the data frame, it may not be very useful to look at the graph of all the data. It makes sense to take a sample from the data frame. A random sample is the better type of sample to take. Once the sample is taken, a scatter plot can be created. The RStudio command for a scatter plot is

gf_point(response_variable ~ explanatory_variable, data= Data_Frame)

The sample is in Table 2.13.

2.3.2.1 Solution

sample_NHANES <- NHANES |> 
  sample_n(size = 100) 
knitr::kable(head(sample_NHANES))

Table 2.13: Head of NHANES Sample Data frame
ID	SurveyYr	Gender	Age	AgeDecade	AgeMonths	Race1	Race3	Education	MaritalStatus	HHIncome	HHIncomeMid	Poverty	HomeRooms	HomeOwn	Work	Weight	Length	HeadCirc	Height	BMI	BMICatUnder20yrs	BMI_WHO	Pulse	BPSysAve	BPDiaAve	BPSys1	BPDia1	BPSys2	BPDia2	BPSys3	BPDia3	Testosterone	DirectChol	TotChol	UrineVol1	UrineFlow1	UrineVol2	UrineFlow2	Diabetes	DiabetesAge	HealthGen	DaysPhysHlthBad	DaysMentHlthBad	LittleInterest	Depressed	nPregnancies	nBabies	Age1stBaby	SleepHrsNight	SleepTrouble	PhysActive	PhysActiveDays	TVHrsDay	CompHrsDay	TVHrsDayChild	CompHrsDayChild	Alcohol12PlusYr	AlcoholDay	AlcoholYear	SmokeNow	Smoke100	Smoke100n	SmokeAge	Marijuana	AgeFirstMarij	RegularMarij	AgeRegMarij	HardDrugs	SexEver	SexAge	SexNumPartnLife	SexNumPartYear	SameSex	SexOrientation	PregnantNow
62481	2011_12	female	39	30-39	NA	Hispanic	Hispanic	College Grad	Separated	65000-74999	70000	3.13	5	Rent	Working	80.5	NA	NA	170.5	27.70	NA	25.0_to_29.9	78	95	58	98	56	96	62	94	54	48.93	1.19	3.83	282	1.300	NA	NA	No	NA	Vgood	2	30	None	None	4	3	23	6	No	No	NA	1_hr	3_hr	NA	NA	Yes	2	6	NA	No	Non-Smoker	NA	No	NA	No	NA	No	Yes	20	25	2	No	Heterosexual	No
53990	2009_10	male	45	40-49	551	Hispanic	NA	8th Grade	LivePartner	20000-24999	22500	0.60	7	Rent	Working	92.8	NA	NA	179.2	28.90	NA	25.0_to_29.9	84	115	68	118	66	114	66	116	70	NA	1.19	5.46	29	0.725	NA	NA	No	NA	Excellent	0	0	None	None	NA	NA	NA	6	No	Yes	7	NA	NA	NA	NA	Yes	NA	0	Yes	Yes	Smoker	25	No	NA	No	NA	No	Yes	14	NA	NA	No	Heterosexual	NA
61467	2009_10	male	15	10-19	184	White	NA	NA	NA	5000-9999	7500	0.64	4	Rent	NA	65.4	NA	NA	182.6	19.61	NA	18.5_to_24.9	82	105	66	106	68	106	70	104	62	NA	1.06	3.70	180	0.293	NA	NA	No	NA	Vgood	1	30	NA	NA	NA	NA	NA	NA	NA	No	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
57201	2009_10	female	33	30-39	404	White	NA	Some College	Married	45000-54999	50000	2.89	11	Own	Working	117.3	NA	NA	157.8	47.11	NA	30.0_plus	96	126	54	120	66	122	62	130	46	NA	2.07	4.76	31	0.378	27	0.321	No	NA	Fair	2	2	None	None	4	1	NA	7	No	No	NA	NA	NA	NA	NA	No	NA	NA	NA	No	Non-Smoker	NA	No	NA	No	NA	No	Yes	18	1	1	No	Heterosexual	Yes
66152	2011_12	female	58	50-59	NA	Black	Black	Some College	Widowed	35000-44999	40000	1.06	6	Own	NotWorking	46.6	NA	NA	162.7	17.60	NA	12.0_18.5	86	122	78	124	80	122	78	NA	NA	22.98	2.84	6.00	52	0.091	NA	NA	No	NA	Fair	0	0	None	None	7	6	14	8	No	No	7	More_4_hr	0_hrs	NA	NA	Yes	2	156	Yes	Yes	Smoker	13	No	NA	No	NA	No	Yes	13	6	0	No	Heterosexual	NA
64471	2011_12	female	37	30-39	NA	White	White	College Grad	Married	more 99999	100000	5.00	6	Own	Working	66.8	NA	NA	161.9	25.50	NA	25.0_to_29.9	64	99	72	98	68	96	74	102	70	8.79	2.02	5.97	142	1.449	NA	NA	No	NA	NA	NA	NA	NA	NA	NA	NA	NA	6	No	Yes	NA	1_hr	0_to_1_hr	NA	NA	NA	NA	NA	NA	No	Non-Smoker	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	No
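Because sample_n() draws rows at random, each run produces a different sample, so your table and scatter plot will likely differ from the ones shown here. A minimal sketch of making a sample repeatable with set.seed() follows; the data frame toy_people and the seed value 2019 are assumptions made up for illustration, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical data frame standing in for a large data set
toy_people <- data.frame(id = 1:1000, weight = 50 + (1:1000) %% 60)

set.seed(2019)   # fix the starting point of the random number generator
sample_a <- toy_people |>
  sample_n(size = 100)

set.seed(2019)   # same seed again, so the same rows are drawn
sample_b <- toy_people |>
  sample_n(size = 100)

# Compare the rows chosen in the two samples
identical(sample_a$id, sample_b$id)
```

Setting a seed is optional. It only matters when you want someone else, or yourself later, to get exactly the same sample.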

Preliminary: State the explanatory variable and the response variable

Let x=explanatory variable = Weight of a person (Weight)

y=response variable = Systolic blood pressure (BPSys1)

gf_point(BPSys1~Weight, data=sample_NHANES, xlab="Weight (kg)", ylab="Systolic Blood Pressure", title="Blood Pressure versus Weight")

Description of Figure 2.10: a scatter plot with dots spread widely across the plot, though a line could be fit through the dots, lower on the left and higher on the right.

Looking at the graph in Figure 2.10, it appears that there is a linear relationship between weight and systolic blood pressure, though it looks somewhat weak. It also appears to be a positive relationship: as weight increases, systolic blood pressure tends to increase.

2.3.3 Time-Series

A time-series plot is a graph showing quantitative data measurements in chronological order. For example, a time-series plot can be used to show profits over the last 5 years. To create a time-series plot in RStudio, use the command

gf_line(response_variable ~ explanatory_variable, data=Data_Frame)

The purpose of a time-series graph is to look for trends over time. Be cautious, though: a trend may not continue. Just because you see an increase doesn’t mean the increase will continue forever. As an example, prior to 2007, many people noticed that housing prices were increasing. The belief at the time was that housing prices would continue to increase. However, the housing bubble burst in 2007, and many houses lost value and haven’t recovered.
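As a small illustration of the gf_line() pattern, here is a sketch of the profits example mentioned above. The data frame profits and its numbers are made up for illustration only, and the ggformula package (which supplies gf_line()) is assumed to be installed.

```r
library(ggformula)

# Hypothetical profits (in thousands of dollars) for five years
profits <- data.frame(
  year = 2015:2019,
  profit = c(120, 135, 150, 142, 160)
)

# Time goes on the horizontal axis, the measurement on the vertical axis
profit_plot <- gf_line(profit ~ year, data = profits,
                       title = "Profit by Year (made-up data)",
                       xlab = "Year", ylab = "Profit (thousands of dollars)")
profit_plot
```

The formula reads response ~ explanatory, so the time variable always goes to the right of the ~.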

2.3.4 Example: Time-Series Plot

The bank assets (in billions of Australian dollars (AUD)) of the Reserve Bank of Australia (RBA) and other financial organizations for the time period of September 1, 1969, through March 1, 2019, are contained in Table 2.14 (Reserve Bank of Australia, 2019). Create a time-series plot of the total assets of Authorized Deposit-taking Institutions (ADIs) and interpret any findings.

Australian<- read.csv( "https://krkozak.github.io/MAT160/Australian_financial.csv") 
knitr::kable(head(Australian))

Table 2.14: Head of Australian Data frame
Date	Day	Assets_RBA	Assets_ADIs_Banks	Assets_ADIs_Building	Assets_ADIs_CU	Assets_ADIs_Total	Assets_RFCs_MM	Assets_RFCs_Finance	Assets_RFCs_Total	Assets_Life.offices	Assets_Life_funds	Assets_Life_Total	Assets_Other_Public_trusts	Assets_Other_Cash_trusts	Assets_Other_Common_funds	Assets_Others_Friendly	Assets_Other_General_insurance	Assets_Other_vehicles	Assets_Unconsolidated
Sep-69	0	2.7	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Dec-69	90	2.9	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Mar-70	180	3.0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Jun-70	270	3.0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Sep-70	360	3.0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Dec-70	450	3.0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

Code book for Data frame Australian

Description The data is a range of economic and financial data produced by the Reserve Bank of Australia and other organizations.

Format

This data frame contains the following columns:

Date: quarters from September 1, 1969, to March 1, 2019

Day: The number of days since September 1, 1969, using 90 days between starts of a quarter. This column is to make it easier to graph in RStudio, and has no other purpose.

Assets_RBA: The assets for the Reserve Bank of Australia

Assets_ADIs_Banks: The assets for Authorized Deposit-taking Institutions (ADIs), Banks

Assets_ADIs_Building: The assets for Authorized Deposit-taking Institutions (ADIs), Building societies

Assets_ADIs_CU: The assets for Authorized Deposit-taking Institutions (ADIs), Credit Unions

Assets_ADIs_Total: The assets for Authorized Deposit-taking Institutions (ADIs), total

Assets_RFCs_MM: The assets for Registered Financial Corporations (RFCs), Money Market Corporations

Assets_RFCs_Finance: The assets for Registered Financial Corporations (RFCs), Finance companies and general financiers

Assets_RFCs_Total: The assets for Registered Financial Corporations (RFCs) total

Assets_Life.offices: The Assets of Life offices and superannuation funds; Life insurance offices

Assets_Life_funds: The Assets of Life offices and superannuation funds; Superannuation funds

Assets_Life_Total: The Assets of Life offices and superannuation; Total

Assets_Other_Public_trusts: The Assets of Other managed funds; Public unit trusts

Assets_Other_Cash_trusts: The Assets of Other managed funds; Cash management trusts

Assets_Other_Common_funds: The Assets of Other managed funds; Common funds

Assets_Others_Friendly: The Assets of Other managed funds; Friendly societies

Assets_Other_General_insurance: The Assets of Other financial institutions; General insurance offices

Assets_Other_vehicles: The Assets of Other financial institutions; Securitisation vehicles

Assets_Unconsolidated: The Assets of Unconsolidated; Statutory funds of life insurance offices; Superannuation

Source Reserve Bank of Australia. (2019, May 13). Statistical Tables. Retrieved July 10, 2019, from https://www.rba.gov.au/statistics/tables/

References Reserve Bank of Australia and other organizations

2.3.4.1 Solution

Variable: x = total assets of Authorized Deposit-taking Institutions (ADIs)

Looking at the code book, one can see that Assets_ADIs_Total is the variable in the data frame that is of interest here. With a time-series plot, the other variable is time. In this case the variable in the data frame that represents time is Date. The problem with Date is that it is recorded as quarter labels, such as Sep-69. This is not easily interpreted by RStudio, so a column called Day was created. From the code book, this is the number of days since September 1, 1969, using 90 days between starts of a quarter. Even though this isn’t perfect, it will work for determining trends. So create a time-series plot of Assets_ADIs_Total versus Day. The command is:

gf_line(Assets_ADIs_Total~Day, data=Australian, title="Total Assets of Authorized Deposit-taking Institutions (ADIs)", xlab="Day since September 1, 1969", ylab="ADI (AUD)")

Description of Figure 2.11: an increasing time-series graph of the total assets of Authorized Deposit-taking Institutions from day 7500 to day 17500. The assets start near 0 and go up to about 4500.

From the graph, total assets of Authorized Deposit-taking Institutions (ADIs) appear to be increasing with a slight dip around 14000 days since September 1, 1969. That would be around the year 2008 (14000 days /360 days per year + 1969).
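The conversion from a Day value back to an approximate calendar year can be checked with a few lines of arithmetic. This sketch only restates the calculation in the text; the variable names are made up for illustration.

```r
days_per_year <- 360   # 4 quarters x 90 days, as the Day column is defined
start_year <- 1969     # the Day count starts on September 1, 1969
dip_day <- 14000       # approximate location of the dip, read off the graph

years_elapsed <- dip_day / days_per_year
approx_year <- start_year + years_elapsed

round(years_elapsed, 1)   # about 38.9 years after the start
round(approx_year, 1)     # about 2007.9
```

Because the count starts in September rather than January, the estimate shifts about two-thirds of a year later, which still lands in 2008.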

Be careful when making a graph. If the vertical axis doesn’t start at 0, then the change can look much more dramatic than it really is. For a graph to be useful to the reader, it needs a title that explains what the graph contains, the axes should be labeled so the reader knows what each axis represents, each axis should have a scale marked, and it is best if the vertical axis contains 0 to show the relationship.
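The effect of the vertical axis can be seen by drawing the same data twice. The following is a sketch under stated assumptions: the data frame sales and its values are hypothetical, and it uses gf_lims() from the ggformula package to force the vertical axis to include 0.

```r
library(ggformula)

# Hypothetical values that vary only slightly
sales <- data.frame(month = 1:6, amount = c(98, 99, 101, 100, 102, 103))

# Default axis: the vertical scale hugs the data, so small changes look large
default_plot <- gf_line(amount ~ month, data = sales,
                        title = "Default vertical axis",
                        xlab = "Month", ylab = "Amount")

# Same data with the vertical axis forced to include 0
zero_plot <- gf_line(amount ~ month, data = sales,
                     title = "Vertical axis starting at 0",
                     xlab = "Month", ylab = "Amount") |>
  gf_lims(y = c(0, NA))

default_plot
zero_plot
```

In the second plot, NA for the upper limit lets the software choose the top of the axis from the data, while the lower limit is fixed at 0.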

2.3.5 Homework for Other Graphical Representations of Data Section

When an anthropologist finds skeletal remains, they need to figure out the height of the person. The height of a person (in cm) and the length of one of their metacarpal bones (in mm) were collected and are in Table 2.15 (Prediction of height, 2013). Create a scatter plot of length and height and state if there is a relationship between the height of a person and the length of their metacarpal.

Metacarpal<- read.csv( "https://krkozak.github.io/MAT160/metacarpal.csv") 
knitr::kable(head(Metacarpal))

Table 2.15: Head of Metacarpal Data frame
length	height
45	171
51	178
39	157
41	163
48	172
49	183

Code book for Data frame Metacarpal

Description When anthropologists analyze human skeletal remains, an important piece of information is living stature. Since skeletons are commonly incomplete, inferences about stature are commonly based on statistical methods that utilize measurements on small bones. The following data was presented in a paper in the American Journal of Physical Anthropology to validate one such method.

Format

This data frame contains the following columns:

length: length of Metacarpal I bone in mm

height: stature of skeleton in cm

Source Prediction of Height from Metacarpal Bone Length. (n.d.). Retrieved July 9, 2019, from http://www.statsci.org/data/general/stature.html

References Musgrave, J., and Harneja, N. (1978). The estimation of adult stature from metacarpal bone length. Amer. J. Phys. Anthropology 48, 113-120.

Devore, J., and Peck, R. (1986). Statistics. The Exploration and Analysis of Data. West Publishing, St Paul, Minnesota.

The value of a house and the amount of rental income that the house brings in during a year are in Table 2.16 (Capital and rental, 2013). Create a scatter plot and state if there is a relationship between the value of the house and the annual rental income.

House<- read.csv( "https://krkozak.github.io/MAT160/house.csv") 
knitr::kable(head(House))

Table 2.16: Head of House Data frame
capital	rental
61500	6656
67500	6864
75000	4992
75000	7280
76000	6656
77000	4576

Code book for Data frame House

Description The data show the capital value and annual rental value of domestic properties in Auckland in 1991.

Format

This data frame contains the following columns:

capital: selling price of the house in Australian dollars (AUD)

rental: annual rental income of the house in Australian dollars (AUD)

Source Capital and rental values of Auckland properties. (2013, September 26). Retrieved from http://www.statsci.org/data/oz/rentcap.html

References Lee, A. (1994) Data Analysis: An introduction based on R. Auckland: Department of Statistics, University of Auckland. Data courtesy of Sage Consultants Ltd.

The World Bank collects information on the life expectancy of a person in each country (“Life expectancy at,” 2013) and the fertility rate per woman in the country (“Fertility rate,” 2013). The data for countries for the year 2011 are in Table 2.17. Create a scatter plot of the data and state if there appears to be a relationship between life expectancy and the number of births per woman in 2011.

Fertility<- read.csv( "https://krkozak.github.io/MAT160/fertility.csv") 
knitr::kable(head(Fertility))

Table 2.17: Head of Fertility Data frame
country	lifexp_2011	fertilrate_2011	lifexp_2000	fertilrate_2000	lifexp_1990	fertilrate_1990
Macao SAR, China	79.91	1.03	77.62	0.94	75.28	1.69
Hong Kong SAR, China	83.42	1.20	80.88	1.04	77.38	1.27
Singapore	81.89	1.20	78.05	NA	76.03	1.87
Hungary	74.86	1.23	71.25	1.32	69.32	1.84
Korea, Rep.	80.87	1.24	75.86	1.47	71.29	1.59
Romania	74.51	1.25	71.16	1.31	69.74	1.84

Code book for Data frame Fertility

Description Data is from the World Bank on the life expectancy of countries and the fertility rates in those countries.

Format

This data frame contains the following columns:

country: Countries in the world

lifexp_2011: Life expectancy of a person born in 2011

fertilrate_2011: Fertility rate in the country in 2011

lifexp_2000: Life expectancy of a person born in 2000

fertilrate_2000: Fertility rate in the country in 2000

lifexp_1990: Life expectancy of a person born in 1990

fertilrate_1990: Fertility rate in the country in 1990

Source Life expectancy at birth. (2013, October 14). Retrieved from http://data.worldbank.org/indicator/SP.DYN.LE00.IN

References Data from World Bank, Life expectancy at birth, total (years)

The World Bank collected data on the percentage of gross domestic product (GDP) that a country spends on health expenditures (Current health expenditure (% of GDP), 2019), the fertility rate of the country (Fertility rate, total (births per woman), 2019), and the percentage of women receiving prenatal care (Pregnant women receiving prenatal care (%), 2019). The data for the countries where this information is available are in Table 2.18. Create a scatter plot of the health expenditure and percentage of women receiving prenatal care in the year 2000, and state if there appears to be a relationship between percentage spent on health expenditure and the percentage of women receiving prenatal care.

Fert_prenatal<-read.csv( "https://krkozak.github.io/MAT160/fertility_prenatal.csv") 
knitr::kable(head(Fert_prenatal))

Table 2.18: Head of Fert_prenatal Data frame
Country.Name	Country.Code	Region	IncomeGroup	f1960	f1961	f1962	f1963	f1964	f1965	f1966	f1967	f1968	f1969	f1970	f1971	f1972	f1973	f1974	f1975	f1976	f1977	f1978	f1979	f1980	f1981	f1982	f1983	f1984	f1985	f1986	f1987	f1988	f1989	f1990	f1991	f1992	f1993	f1994	f1995	f1996	f1997	f1998	f1999	f2000	f2001	f2002	f2003	f2004	f2005	f2006	f2007	f2008	f2009	f2010	f2011	f2012	f2013	f2014	f2015	f2016	f2017	p1986	p1987	p1988	p1989	p1990	p1991	p1992	p1993	p1994	p1995	p1996	p1997	p1998	p1999	p2000	p2001	p2002	p2003	p2004	p2005	p2006	p2007	p2008	p2009	p2010	p2011	p2012	p2013	p2014	p2015	p2016	p2017	p2018	e2000	e2001	e2002	e2003	e2004	e2005	e2006	e2007	e2008	e2009	e2010	e2011	e2012	e2013	e2014	e2015	e2016
Angola	AGO	Sub-Saharan Africa	Lower middle income	7.478	7.524	7.563	7.592	7.611	7.619	7.618	7.613	7.608	7.604	7.601	7.603	7.606	7.611	7.614	7.615	7.609	7.594	7.571	7.540	7.504	7.469	7.438	7.413	7.394	7.380	7.366	7.349	7.324	7.291	7.247	7.193	7.130	7.063	6.992	6.922	6.854	6.791	6.734	6.683	6.639	6.602	6.568	6.536	6.502	6.465	6.420	6.368	6.307	6.238	6.162	6.082	6.000	5.920	5.841	5.766	5.694	5.623	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	65.6	NA	NA	NA	NA	NA	79.8	NA	NA	NA	NA	NA	NA	NA	NA	81.6	NA	NA	2.334435	5.483823	4.072288	4.454100	4.757211	3.734836	3.366183	3.211438	3.495036	3.578677	2.736684	2.840603	2.692890	2.990929	2.798719	2.950431	2.877825
Armenia	ARM	Europe & Central Asia	Upper middle income	4.786	4.670	4.521	4.345	4.150	3.950	3.758	3.582	3.429	3.302	3.199	3.114	3.035	2.956	2.875	2.792	2.712	2.641	2.582	2.538	2.510	2.499	2.503	2.517	2.538	2.559	2.578	2.591	2.592	2.578	2.544	2.484	2.400	2.297	2.179	2.056	1.938	1.832	1.747	1.685	1.648	1.635	1.637	1.648	1.665	1.681	1.694	1.702	1.706	1.703	1.693	1.680	1.664	1.648	1.634	1.622	1.612	1.604	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	82	NA	NA	92.4	NA	NA	NA	NA	93.0	NA	NA	NA	NA	99.1	NA	NA	NA	NA	NA	99.6	NA	NA	6.505224	6.536263	5.690812	5.610725	8.227844	7.034880	5.588461	5.445144	4.346749	4.689046	5.264181	3.777260	6.711859	8.269840	10.178299	10.117627	9.927321
Belize	BLZ	Latin America & Caribbean	Upper middle income	6.500	6.480	6.460	6.440	6.420	6.400	6.379	6.358	6.337	6.316	6.299	6.288	6.284	6.285	6.287	6.278	6.250	6.195	6.109	5.992	5.849	5.684	5.510	5.336	5.170	5.019	4.886	4.771	4.671	4.584	4.508	4.436	4.363	4.286	4.201	4.109	4.010	3.908	3.805	3.703	3.600	3.496	3.390	3.282	3.175	3.072	2.977	2.893	2.821	2.762	2.715	2.676	2.642	2.610	2.578	2.544	2.510	2.475	NA	NA	NA	NA	NA	96	NA	NA	NA	NA	NA	NA	98	95.9	100.0	NA	98	NA	NA	94.0	94.0	99.2	NA	NA	NA	96.2	NA	NA	NA	97.2	97.2	NA	NA	3.942030	4.228792	3.864327	4.260178	4.091610	4.216728	4.163924	4.568384	4.646109	5.311070	5.764874	5.575126	5.322589	5.727331	5.652458	5.884248	6.121374
Cote d’Ivoire	CIV	Sub-Saharan Africa	Lower middle income	7.691	7.720	7.750	7.781	7.811	7.841	7.868	7.893	7.912	7.927	7.936	7.941	7.942	7.939	7.929	7.910	7.877	7.828	7.763	7.682	7.590	7.488	7.383	7.278	7.176	7.078	6.984	6.892	6.801	6.710	6.622	6.536	6.454	6.374	6.298	6.224	6.152	6.079	6.006	5.932	5.859	5.787	5.717	5.651	5.589	5.531	5.476	5.423	5.372	5.321	5.269	5.216	5.160	5.101	5.039	4.976	4.911	4.846	NA	NA	NA	NA	NA	NA	NA	NA	83.2	NA	NA	NA	NA	84.3	87.6	NA	NA	NA	NA	87.3	84.8	NA	NA	NA	NA	NA	90.6	NA	NA	NA	93.2	NA	NA	5.672228	4.850694	4.476869	4.645306	5.213588	5.353556	5.808850	6.259154	6.121605	6.223329	6.146566	5.978840	6.019660	5.074942	5.043462	5.262711	4.403621
Ethiopia	ETH	Sub-Saharan Africa	Low income	6.880	6.877	6.875	6.872	6.867	6.864	6.867	6.880	6.903	6.937	6.978	7.020	7.060	7.094	7.121	7.143	7.167	7.195	7.230	7.271	7.316	7.360	7.397	7.424	7.437	7.435	7.418	7.387	7.347	7.298	7.246	7.193	7.143	7.094	7.046	6.995	6.935	6.861	6.769	6.659	6.529	6.380	6.216	6.044	5.867	5.690	5.519	5.355	5.201	5.057	4.924	4.798	4.677	4.556	4.437	4.317	4.198	4.081	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	26.7	NA	NA	NA	NA	27.6	NA	NA	NA	NA	NA	33.9	NA	NA	41.2	NA	62.4	NA	NA	4.365290	4.713670	4.705820	4.885341	4.304562	4.100981	4.226696	4.801925	4.280639	4.412473	5.466372	4.468978	4.539596	4.075065	4.033651	3.975932	3.974016
Guinea	GIN	Sub-Saharan Africa	Low income	6.114	6.127	6.138	6.147	6.154	6.160	6.168	6.177	6.189	6.205	6.225	6.249	6.277	6.306	6.337	6.369	6.402	6.436	6.468	6.500	6.529	6.557	6.581	6.602	6.619	6.631	6.637	6.637	6.631	6.618	6.598	6.570	6.535	6.493	6.444	6.391	6.334	6.273	6.211	6.147	6.082	6.015	5.947	5.877	5.804	5.729	5.653	5.575	5.496	5.417	5.336	5.256	5.175	5.094	5.014	4.934	4.855	4.777	NA	NA	NA	NA	NA	NA	57.6	NA	NA	NA	NA	NA	NA	70.7	NA	NA	NA	84.3	NA	82.2	NA	88.4	NA	NA	NA	NA	85.2	NA	NA	NA	84.3	NA	NA	3.697726	3.884610	4.384152	3.651081	3.365547	2.949490	2.960601	3.013074	2.762090	2.936868	3.067742	3.789550	3.503983	3.461137	4.780977	5.827122	5.478273

Code book for Data frame Fert_prenatal

Description Data is from the World Bank on the fertility rate, the health expenditure, and the percentage of women receiving prenatal care in countries around the world.

Format

This data frame contains the following columns:

Country.Name: Countries around the world

Country.Code: Three letter country code for countries around the world

Region: Location of a country around the world as classified by the World Bank

IncomeGroup: The income level of a country as classified by the World Bank

f1960-f2017: Fertility rate of a country from 1960-2017

p1986-p2018: Percentage of women receiving prenatal care in the country in 1986-2018

e2000-e2016: Expenditure amounts of the countries for medical care in 2000-2016 (% of GDP)

Source Fertility rate, total (births per woman). (n.d.). Retrieved July 8, 2019, from https://data.worldbank.org/indicator/SP.DYN.TFRT.IN Pregnant women receiving prenatal care (%). (n.d.). Retrieved July 9, 2019, from https://data.worldbank.org/indicator/SH.STA.ANVC.ZS Current health expenditure (% of GDP). (n.d.). Retrieved July 9, 2019, from https://data.worldbank.org/indicator/SH.XPD.CHEX.GD.ZS

References Data from World Bank, fertility rate, expenditure on health, and pregnant woman rate of prenatal care.

The Australian Institute of Criminology gathered data on the number of deaths (per 100,000 people) due to firearms during the period 1983 to 1997 (“Deaths from firearms,” 2013). The data is in Table 2.19. Create a time-series plot of the data and state any findings you can from the graph.

Firearm<- read.csv( "https://krkozak.github.io/MAT160/rate.csv") 
knitr::kable(head(Firearm))

Table 2.19: Head of Firearm Data frame
year	rate
1983	4.31
1984	4.42
1985	4.52
1986	4.35
1987	4.39
1988	4.21

Code book for Data Frame Firearm

Description The data give the number of deaths caused by firearms in Australia from 1983 to 1997, expressed as a rate per 100,000 of population.

Format

This data frame contains the following columns:

year: Years from 1983 to 1997

rate: Rate of deaths caused by firearms in Australia per 100,000 population

Source Deaths from firearms. (2013, September 26). Retrieved from http://www.statsci.org/data/oz/firearms.html

References Australian Institute of Criminology, 1999. The data was contributed by Rex Boggs, Glenmore State High School, Rockhampton, Queensland, Australia.
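As a rough illustration of what a time-series plot of this kind looks like, the sketch below uses only the six rows shown in the head of Table 2.19, not the full data set, and uses base R so that it runs without any packages. The name Firearm_head is made up for this example.

```r
# The six rows shown in Table 2.19 (the full data set runs through 1997).
Firearm_head <- data.frame(
  year = 1983:1988,
  rate = c(4.31, 4.42, 4.52, 4.35, 4.39, 4.21)
)

# Time goes on the horizontal axis; points are connected in time order.
plot(rate ~ year, data = Firearm_head, type = "b",
     xlab = "Year", ylab = "Firearm deaths per 100,000 people")

# One simple finding to read off the graph: the year with the highest rate.
peak_year <- Firearm_head$year[which.max(Firearm_head$rate)]
peak_year

# With mosaic loaded, a formula-style alternative on the full data frame
# would be: gf_line(rate ~ year, data = Firearm)
```

Over these six years the rate peaks in 1985. The full data set is needed before stating findings for the whole period.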

The economic crisis of 2008 affected many countries, though some more than others. Some people in Australia have claimed that Australia wasn’t hurt that badly by the crisis. The bank assets (in billions of Australian dollars (AUD)) of the Reserve Bank of Australia (RBA) for the period from September 1, 1969, through March 1, 2019, are contained in Table 2.14 (Reserve Bank of Australia, 2019). Create a time-series plot of the assets of the RBA and interpret any findings.

Code book for Data Frame Australian is below Table 2.14.

The consumer price index (CPI) is a measure used by the U.S. government to describe the cost of living. The CPI for the U.S. for the years 1913 through 2019, with 1982 as the base year to which all other years are compared (Consumer Price Index Data from 1913 to 2019, 2019), is given in Table 2.20. Create a time-series plot of the average annual CPI and interpret it.

CPI <- read.csv("https://krkozak.github.io/MAT160/CPI_US.csv")
knitr::kable(head(CPI))

Table 2.20: Head of CPI Data frame
Year	Jan	Feb	Mar	Apr	May	June	July	Aug	Sep	Oct	Nov	Dec	Annual_avg	PerDec_Dec	Perc_Avg_Avg
1913	9.8	9.8	9.8	9.8	9.7	9.8	9.9	9.9	10.0	10.0	10.1	10.0	9.9	–	–
1914	10.0	9.9	9.9	9.8	9.9	9.9	10.0	10.2	10.2	10.1	10.2	10.1	10.0	1	1
1915	10.1	10.0	9.9	10.0	10.1	10.1	10.1	10.1	10.1	10.2	10.3	10.3	10.1	2	1
1916	10.4	10.4	10.5	10.6	10.7	10.8	10.8	10.9	11.1	11.3	11.5	11.6	10.9	12.6	7.9
1917	11.7	12.0	12.0	12.6	12.8	13.0	12.8	13.0	13.3	13.5	13.5	13.7	12.8	18.1	17.4
1918	14.0	14.1	14.0	14.2	14.5	14.7	15.1	15.4	15.7	16.0	16.3	16.5	15.1	20.4	18

Code book for Data frame CPI

Description This table of Consumer Price Index (CPI) data is based upon a 1982 base of 100.

Format

This data frame contains the following columns:

Year: Year from 1913 to 2019

Jan, Feb, Mar, Apr, May, June, July, Aug, Sep, Oct, Nov, Dec: CPI for a particular month

Annual_avg: The average CPI for a particular year

PerDec_Dec: Percent change from December to December

Perc_Avg_Avg: Percent change from Annual Average to Annual Average

Source Consumer Price Index Data from 1913 to 2019. (2019, June 12). Retrieved July 10, 2019, from https://www.usinflationcalculator.com/inflation/consumer-price-index-and-annual-percent-changes-from-1913-to-2008/

References US Inflation Calculator website, 2019.
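To make the "percent change from Annual Average to Annual Average" column concrete, the sketch below recomputes it from the Annual_avg values in the head of Table 2.20. It is only an illustration under the assumption that the column is the ordinary percent change from the previous year, rounded to one decimal place. The names cpi_head, prev, and pct_change are made up for this example.

```r
# Annual average CPI for 1913-1918, copied from the head of Table 2.20.
cpi_head <- data.frame(
  Year = 1913:1918,
  Annual_avg = c(9.9, 10.0, 10.1, 10.9, 12.8, 15.1)
)

# Previous year's average; the first year has no previous year, so it is NA.
prev <- c(NA, head(cpi_head$Annual_avg, -1))

# Percent change = (this year - previous year) / previous year * 100.
cpi_head$pct_change <- round((cpi_head$Annual_avg - prev) / prev * 100, 1)
cpi_head
```

For 1914 through 1918 this gives 1, 1, 7.9, 17.4, and 18, which agrees with the Perc_Avg_Avg values shown in the table. For example, for 1916 the calculation is (10.9 - 10.1) / 10.1 * 100, or about 7.9.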

The mean and median incomes of U.S. households in current dollars are given in Table 2.21. Create a time-series plot and interpret.

US_income <- read.csv("https://krkozak.github.io/MAT160/US_income.csv")
knitr::kable(head(US_income))

Table 2.21: Head of US_income Data frame
year	number	med_income_current	med_income_2017	mean_income_current	mean_income_2017
2017	127586	61372	61372	86220	86220
2016	126224	59039	60309	83143	84931
2015	125819	56516	58476	79263	82012
2014	124587	53657	55613	75738	78500
2013	122952	51939	54744	72641	76565
2012	122459	51017	54569	71274	76237

Code book for Data Frame US_income

Description This table is of US mean and median incomes in both current dollars and in 2017 dollars.

Format

This data frame contains the following columns:

year: Year from 1975 to 2017

number: Number of households as of March of the following year (in thousands)

med_income_current: median income of a US household in current dollars

med_income_2017: median income of a US household in 2017 CPI-U-RS adjusted dollars

mean_income_current: mean income of a US household in current dollars

mean_income_2017: mean income of a US household in 2017 CPI-U-RS adjusted dollars

Source US Census Bureau. (2018, March 06). Data. Retrieved July 21, 2019, from https://www.census.gov/programs-surveys/cps/data-detail.html

References U.S. Census Bureau, Current Population Survey, Annual Social and Economic Supplements.
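Two series can share one time-series plot. The sketch below is one possible approach, using only the six rows shown in the head of Table 2.21 and base R graphics. Since the table lists the most recent year first, the rows are sorted by year before plotting. The name US_income_head is made up for this example.

```r
# Current-dollar columns from the six rows shown in Table 2.21.
US_income_head <- data.frame(
  year = 2017:2012,
  med_income_current = c(61372, 59039, 56516, 53657, 51939, 51017),
  mean_income_current = c(86220, 83143, 79263, 75738, 72641, 71274)
)

# Put the rows in time order so the line is drawn from left to right.
US_income_head <- US_income_head[order(US_income_head$year), ]

# Set the vertical range so that both series fit on the same axes.
y_range <- range(US_income_head$med_income_current,
                 US_income_head$mean_income_current)

# Mean income as a solid line, median income as a dashed line.
plot(US_income_head$year, US_income_head$mean_income_current, type = "b",
     ylim = y_range, xlab = "Year", ylab = "Income (current dollars)")
lines(US_income_head$year, US_income_head$med_income_current,
      type = "b", lty = 2)
legend("topleft", legend = c("Mean", "Median"), lty = c(1, 2))
```

In these six years both series rise, and the mean stays above the median every year.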